Nuggeteer: Automatic Nugget-Based Evaluation using Descriptions and Judgements
نویسندگان
چکیده
The TREC Definition and Relationship questions are evaluated on the basis of information nuggets that may be contained in system responses. Human evaluators provide informal descriptions of each nugget, and judgements (assignments of nuggets to responses) for each response submitted by participants. While human evaluation is the most accurate way to compare systems, approximate automatic evaluation becomes critical during system development. We present Nuggeteer, a new automatic evaluation tool for nugget-based tasks. Like the first such tool, Pourpre, Nuggeteer uses words in common between candidate answer and answer key to approximate human judgements. Unlike Pourpre, but like human assessors, Nuggeteer creates a judgement for each candidatenugget pair, and can use existing judgements instead of guessing. This creates a more readily interpretable aggregate score, and allows developers to track individual nuggets through the variants of their system. Nuggeteer is quantitatively comparable in performance to Pourpre, and provides qualitatively better feedback to developers.
منابع مشابه
Improving Complex Interactive Question Answering with Wikipedia Anchor Text
When the objective of an information retrieval task is to return a nugget rather than a document, query terms that exist in a document will often not be used in the most relevant information nugget in the document. In this paper, a new method of query expansion is proposed based on the Wikipedia link structure surrounding the most relevant articles selected automatically. Evaluated with the Nug...
متن کاملComparing Automatic Evaluation Measures for Image Description
Image description is a new natural language generation task, where the aim is to generate a human-like description of an image. The evaluation of computer-generated text is a notoriously difficult problem, however, the quality of image descriptions has typically been measured using unigram BLEU and human judgements. The focus of this paper is to determine the correlation of automatic measures w...
متن کاملIIT Kharagpur at TAC 2009: Statistical and Nugget-based Model for Automatic Summary Evaluation
In this paper we present our participation at TAC 2009 AESOP (Automatically Evaluating Summaries Of Peers) task. We make use of a statistical model for evaluation metric correlating to Overall Responsiveness and a nugget-based pyramid model for correlating to the Pyramid manual metric of TAC 2009. We also present the performance of our three submitted runs as per the official TAC 2009 evaluatio...
متن کاملTowards Succinct and Relevant Image Descriptions
What does it mean to produce a good description of an image? Is a description good because it correctly identifies all of the objects in the image, because it describes the interesting attributes of the objects, or because it is short, yet informative? Grice’s Cooperative Principle, stated as “Make your contribution such as is required, at the stage at which it occurs, by the accepted purpose o...
متن کاملImprovement of generative adversarial networks for automatic text-to-image generation
This research is related to the use of deep learning tools and image processing technology in the automatic generation of images from text. Previous researches have used one sentence to produce images. In this research, a memory-based hierarchical model is presented that uses three different descriptions that are presented in the form of sentences to produce and improve the image. The proposed ...
متن کامل